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We present a method to discretize the Kohn-Sham Hamiltonian matrix in the pseudopotential 
framework by a small set of basis functions automatically contracted from a uniform basis set such 
as planewaves. Each basis function is localized around an element, which is a small part of the 
global domain containing multiple atoms. We demonstrate that the resulting basis set achieves 
meV accuracy for 3D densely packed systems with a small number of basis functions per atom. The 
procedure is applicable to insulating and metallic systems. 
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I. INTRODUCTION 

Kohn-Sham density functional theory (KSDFT^is the 
most widely used electronic structure theory for con- 
densed matter systems. When solving the Kohn-Sham 
equations, the choice of basis functions usually poses a 
dilemma for practitioners. The accurate and system- 
atically improvable basis functions that are uniform in 
space, such as plane waves or finite elements, typically 
result in a large number of degrees of freedom (500 ~ 
10000) per atom in the framework of norm conserving 
pseudopotentiaP especially for transition metal elements. 
The number of basis functions per atom can be reduced 
to the order of hundreds using ultrasoft pseudopotential 3 , 
or augmentation techniques in the core-region such as the 
linearized augmented plane- wave (LAPW) method 4 and 
the projector augmented wave (PAW) method 5 . The rel- 
atively large number of basis functions used leads to a 
large prefactor in front of the already expensive cubic 
scaling for solving KSDFT. 

Contracted basis functions, such as Gaussian type or- 
bitals, atomic orbitals or muffin-tin orbitals, can repre- 
sent the Kohn-Sham orbitals with a small number of de- 
grees of freedom per atom (4 ~ 100). These contracted 
basis functions contain a number of parameters to be 
determined. The flexibility for choosing different forms 
of parameters has generated a vast amount of literature 
(see e.g. Refs. I6HTT]) in the past few decades, which has 
been reviewed recently in Ref. [T2l The parameters in 
the contracted basis functions are typically constructed 
by a fitting procedure for a range of reference systems. 
The necessary inclusion of polarization basis^, diffuse 
basis 021, multiple radial functions for each angular mo- 
mentum (multiple ( basis 8 ) are just a few examples when 
the choice of basis functions becomes difficult and system 
dependent especially for complex systems. 

It is desirable to combine the advantage of uniform 
basis set in which the accuracy is controlled by no more 
than a handful of universal parameters for almost all ma- 
terials, and the advantage of contracted basis functions 
with a very small number of basis functions per atom. In 
other words, we would like to generate a small number 
of contracted basis functions by a unified procedure with 



high accuracy comparable to that obtained from uniform 
basis functions. In a recent workP^, we have developed a 
unified method for constructing a set of contracted basis 
functions from a uniform basis set such as planewaves in 
the pseudopotential framework. The new basis set, called 
adaptive local basis (ALB) set, is constructed by solving 
the Kohn-Sham problem restricted to a small part of the 
domain called element. Each ALB is discontinuous from 
the perspective of the global domain, and the continuous 
Kohn-Sham orbitals are approximated by the discontin- 
uous ALBs under a discontinuous Galerkin framework^. 
It was demonstrated that the ALBs are able to achieve 
high accuracy (in the order of 1 meV) using disordered 
Na and Si as examples. However, the number of basis 
functions per atom increases with respect to dimension- 
ality. For example, 40 basis functions per atom is needed 
to reach the accuracy of 1 meV/atom for 3D bulk Na 
system. 

In this paper, we propose a new basis set that is con- 
structed from linear combination of adaptive local ba- 
sis functions. Each new basis function, dubbed element 
orbital (EO), has localized nature around its associated 
element of the domain. The number of EOs used is sig- 
nificantly reduced compared to the number of ALBs for 
3D bulk systems. We demonstrate that 4 EOs per atom 
are sufficient to achieve 1 meV per atom accuracy for 3D 
bulk Na system with disorderedness. We also apply EOs 
to study Na, Si and graphene, with varying system sizes, 
lattice constants or types of defects. The new method 
consistently achieves meV accuracy for calculating the 
total energy when compared to standard electronic struc- 
ture software such as ABINIT^. Since the EOs are con- 
tracted from a uniform basis set such as planewave basis 
set, the shape of the EOs has more flexibility to reflect the 
environmental effect than contracted basis sets which are 
centered around atoms. Numerical result indicates that 
the shape of EOs can resemble both atomic orbitals of dif- 
ferent angular momentum and chemical bonds centered 
in the interstitial region, depending on their chemical en- 
vironment. 

We remark that the construction of the EOs is closely 
related to several existing techniques for reducing the 
number of basis functions per atom, starting from a 
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large primitive basis set consisting of Gaussian orbitals or 
atomic orbitals ^ 7 * 18 * 19 *. However, the EOs are contracted 
from a fine uniform basis set such as planewaves, and 
a number of difficulties arise which makes the previous 
techniques difficult to be directly applied. For instance, 
the filtration technique in Ref. I18|19l constructs a near- 
minimal basis set from a large number of Gaussian type 
orbitals by applying a filtration matrix to a set of trial 
orbitals, taken from one or a few Gaussian-type orbitals. 
When the Gaussian-type orbitals are replaced by a fine 
uniform basis set such as planewaves, finding a good set 
of trial orbitals itself becomes a difficult task, and the 
construction of trial orbitals can inevitably introduce a 
set of undetermined parameters, which is not desirable 
in the current framework. 

This paper is organized as follows: Section [IT] intro- 
duces the adaptive local absis functions in the discontin- 
uous Galerkin framework for solving Kohn-Sham density 
functional theory in the pseudopotential framework. The 
construction of the element orbitals is introduced in Sec- 
tion III Section IV discusses briefly the implementation 



procedure of element orbitals. The performance of ele- 
ment orbitals is reported in Section followed by the 
discussion and conclusion in Section IVll 



II. ADAPTIVE LOCAL BASIS FUNCTIONS 

Consider a quantum system with N electrons under ex- 
ternal potential by V ex t m a rectangular domain Q with 
periodic boundary condition. To simplify the equations, 
we ignore the electron spin for now. In Kohn-Sham den- 
sity functional theory at a finite T = l/(fc.B/3j^QI the 
Helmholtz free energy is given by 

^tot = ^tot({^}, {fi}) = \J2^J i v ^(*)i 2 dx 

V ext (x)p(x)dx + J2l^f i \ I b* e (^i{x)Ax\ 2 
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Correspondingly {^(x)} and {fi} are the solutions to 
the minimization problem 



r mm/tot(Wi},{/i}), 



s.t. 



ip*(x)i/jj(x)dx = dij, ij = ,7V. 



(2) 



{fi} G [0, 1] are the occupation numbers which add up 



to the total number of electrons N = J2iLi fi- Here we 
use exchange-correlation functional under local density 
approximation (LDA)E^ anc [ adopt norm conserving 



pseudopotentiaP, with the projection vector of the non- 
local pseudopotential in the Kleinman-Bylander forrrPU 
denoted by {bi(x)}, and 7^ = ±1 is a sign. The number 
of eigenstates N calculated in practice is chosen to be 
slightly larger than the number of electrons N in order 
to compensate for the finite temperature effect, follow- 
ing the criterion that the occupation number f^ is suf- 
ficiently small (less than 10 -8 ). The electron density is 
given by 
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The Kohn-Sham equation, or the Euler-Lagrange 
equation associated with ([2| 



= (-|A + y e ffM + ^7^IM(^l)^ = A^, (3) 

t 

where the effective one-body potential V e f? [p] is 

p(y) 



Ves\p](x) =V ext (x) + 



\x-y 



d?/ + 4c [/>(>)] 



and the occupation numbers {fi}i>\ follow the Fermi- 
Dirac distribution 



l + exp(/3(A;-/i)) 



Here the chemical potential ji is chosen so that ^2f =1 fi = 
N. In each SCF iteration of ([3|, we freeze p and solve for 
the N lowest eigenfunctions {^i(x)} 1<ri<] y. This linear 
eigenvalue problem is the focus of the following discus- 
sion. 

The discontinuous Galerkin (DG) frameworkP^ pro- 
vides flexibility in choosing appropriate basis functions 
to discretize the Kohn-Sham Hamiltonian H[p]. In the 
DG framework, a smooth function delocalized across the 
global domain can be systematically approximated by a 
set of discontinuous functions that are localized in the 
real space. Let T = {Ei,E2,--- ,£^m} be a collection 
of elements, i.e. disjoint rectangular partitions of ^, and 
S be the collection of surfaces {dEk} that correspond to 
each element Ek in T. We associate with each Ek a set 
of orthogonal basis functions {ukj(x)}i<j< J k supported 
in Ek, with the total number of basis functions given by 



M 



k=l 



Under such a basis set, the Hamiltonian is discretized 
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into an N° x N b matrix with entries given by 
H(fe , ,/;fc,j) 

i 

(4) 

where (•, -) r and are inner products in the bulk 

and on the surface respectively, and a > is a fixed pa- 
rameter for penalizing cross-element discontinuity. The 
notations {{•}} and [[•]] stand for the average and jump 
operators across surfaces^. Comparing (|3| and Q, the 
new terms involving the average and jump operators can 
be derived from integration by parts of the Laplacian op- 
erator, and provide consistency and stability of the DG 
method^. 




FIG. 1: (color online) Sketch for the construction of 
adaptive local basis functions and element orbitals. 
Each adaptive local basis function is supported in an 
element. Each element orbital is supported in an 
extended element. 

In the work of adaptive local basis set 15 , the functions 
{ u k,j}i<j< j k m each element Ek are determined as fol- 
lows. Let d be the dimension of the system. For each 
(one black box in Fig. [TJ, we define an associated ex- 
tended element Q/c, which includes both Ek and its 3 d — 1 
neighboring elements. Define Hg k [p] to be the restric- 
tion of H[p] to Qk with periodic boundary condition and 
with potential given by the restriction of V^ff[p] to Qk- 
HQ k [p] is then discretized and diagonalized with uniform 
basis functions such as planewaves. We denote the cor- 
responding eigenvalues and eigenfunctions by {^k,j}j>i 
and {<Pfc,j(#)}j>i, respectively, starting from the lowest 
eigenvalue. One then restricts the first Jk functions of 
{(Pk,j(%)}j>i to Ek, where Jk is set to be proportional to 
the number of electrons inside the extended element Qk 
(see the numerical examples for specific choice of Jk). In 
addition, we define for each Ek 

\ C k = \k,j k , (5) 



i.e. the largest selected eigenvalue in Ek which shall be 
used later. Applying the Gram-Schmidt procedure to 
{(Pk,j(%)}i<j<j k then gives rise to a set of orthonormal 
functions 

{ukA x )}^<j<j k ( 6 ) 

for each Ek- The union of such functions over all ele- 
ments {^/c,j(^)}i</c<M,i<j<j fc gives the set of adaptive 
local basis functions (ALBs). 

For a given system, the partition of Ek is kept to be 
the same even with changing atomic configurations as in 
the case of structure optimization and molecular dynam- 
ics. Dangling bonds may form when atoms are present 
on the surface of the extended elements, but we empha- 
size that these dangling bonds are not needed to be pas- 
sivated by introducing auxiliary atoms near the surface 
of the extended element^. This is because the poten- 
tial is not obtained self-consistently within Q/c, but in- 
stead from the restriction of the screened potential in 
the global domain Q to Qk in each SCF iteration, which 
mutes the catastrophic damage of the dangling bonds. 
The oscillation in the basis functions caused by the dis- 
continuity of the potential at the surface of the Qk (called 
Gibbs phenomenon) still exists, but it damps exponen- 
tially away from the surface of Qk and has controlled 
effect in Ek- Using disordered Na and Si as examples, we 
demonstrated that ALB can achieve meV accuracy per 
atom using 4 ~ 40 basis functions per atorrP^. 



III. ELEMENT ORBITALS 

The high accuracy of ALBs indicates that the span of 
{ u k,j}i<k<M,i<j<j k approximately contains the span of 
the Kohn-Sham orbitals {V^Ik^at- However, we found 
that the number of basis functions per atom may vary 
significantly with respect to the dimensionality d of the 
system, which has not been seen reported in the literature 
using traditional contracted basis set to the extent of our 
knowledge. 

The dimension dependence of ALBs can be intuitively 
understood as follows, motivated from the success of the 
contracted basis set such as atomic orbitals. Consider 
the case where an atom is positioned at the center of ele- 
ment Ek and assume for simplicity that each of its atomic 
orbital overlaps only with the neighboring elements (i.e., 
those inside the extended element Qk)- In order to in- 
clude one such atomic orbital denoted by rj(x) in the span 
of {u k ,j(x)}i< k <M,i<j<j k , each neighboring element E k > 
in Qk should allocate one of its ALBs for representing 
the restriction of rj(x) in Ek> • This implies that N b , the 
total number of ALBs, should roughly be equal to 3 d 7V, 
which becomes increasingly redundant with respect to 
the dimension d. In fact, this is close to what has been 
observed in the numerical experiments^. 

In order to avoid this redundancy and motivated by 
the construction of atomic orbitals, we propose to build 
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a new basis set by piecing the ALBs in neighboring ele- 
ments {Ek>} in Qk to construct functions that are quali- 
tatively close to the atomic orbit als. To distinguish them 
from the pre-fltted atomic orbitals, we name these func- 
tions element orbitals (EOs). In order to achieve this, 
one is faced mainly with three issues. First, the ALBs 
are always discontinuous across the element boundaries, 
while qualitatively the EOs should be a continuous func- 
tion since the atomic orbitals are continuous. Second, 
when one pieces back the ALBs to obtain the EOs, it is 
essential that the resulting functions have low- energy. Fi- 
nally, one needs to make sure that the EOs of Ek should 
be localized at Ek in order to avoid degeneracy. 

A two-step procedure is proposed to address these 
three issues. In the first step, we construct, for each 
element Ek, a set of candidate functions that take care of 
the first two issues. Then in the second step, we identify 
the element orbitals by localizing the candidate functions. 
More specifically, the method proceeds as follows. 

Let us fix an element Ek- First, since each ALB is 
only supported in its associated element and equal to 
zero outside, we seek for a set of candidate functions for 
element Ek that are linear combinations of the ALBs of 
both Ek and its 3 d — 1 neighbors (Fig. [I]). Denoting by 
X the index of all the ALBs, and by Xk C X the index set 
of ALBs supported in Qk, we define a local Hamiltonian 

Hfc = H(Zfc,Zfc), 

i.e., the restriction of H to the index set Xk. Following the 
intuition that the atomic orbitals should only be affected 
by the local environment of Ek, it is reasonable to assume 
that the low eigenfunctions of serve as good candidate 
functions. Computationally, we diagonalize by 

H k Mk = M k Ak, (7) 

where the diagonal of contains all the eigenvalues 
bounded from above by the cut-off energy A£ given by 
([5]) and the columns of contains the corresponding 
eigenfunctions. The matrix is called the merging ma- 
trix for element Ek- We argue that this step addresses the 
continuity and low-energy issues of the element orbitals 
since the eigenfunctions ^ are qualitatively smooth due 
to the cross-element penalty term of the DG formula- 
tion. Choosing the eigenfunctions below AJ: ensures that 
the candidate functions have low-energy. 

Second, we localize these candidate functions to be 
centered at Ek using a penalizing weight function Wk(x) 
defined for x e Qk- Wk(%) is on ly nonzero in the ex- 
tended element Qk outside a certain distance, called the 
localization radius, from the boundary of Ek (light gray 
area in Fig. [I]). For simplicity we choose Wk(x) = 1 in 
the penalty area and otherwise. More sophisticated 
weighting function as developed for linear scaling meth- 
ods^ and confining potentials as developed for atomic 
orbital^ can be used and optimized for EOs in the fu- 
ture work. A weighting matrix for the adaptive basis 
functions in the index set Xk is defined in the extended 



element Qk by 

\N k (k' ,j'',k" ,3") = (u k >j>,Wk-Uk>>,j>>) T . 

In order to localize the candidate functions, we solve a 
second eigenvalue problem 

(M£W fc M fc )Lfc = UlUkhkVk = L k Vk, 

where M^M^ = X since are orthogonal from The 
columns of L^ and the diagonal of Vk consist of the first 
N% eigenfunctions and eigenvalues, respectively. Here N% 
is the number of element orbitals (EOs) of Ek- As will be 
shown later in the numerical results, a small number of 
EOs per atom already achieve high accuracy in the total 
energy calculation. We call the matrix L^ the localization 
matrix, and the product M^L^ gives the coefficients of 
the EOs in Ek in terms of the ALBs indexed by Xk. In 
order to present these EOs in terms of the whole adaptive 
basis set, we introduce an \X\ x \Xk\ selection matrix 
such that Sk(Xk,Xk) is equal to the identity and all zero 
otherwise. By defining the N b x N% coefficient matrix 
Ck = SfeM/eL/c, we can construct the element orbitals 
associated with Ek by 

0k,l( X ) = ^2 ^,^(^)(Cfc)^;Z, I = 1, • • • ,iVjfe. (8) 

k',j' 

Note that, since these functions are localized in Qk by 
construction, the index k' only runs through the elements 
insides Qk- Finally, the coefficient matrix 

C — (Ci, . . . , Cm) 

gives the whole set of coefficients of the N° = J2^f =1 N% 
element orbitals based on adaptive local basis functions. 
Once the element orbitals are identified, we solve an N° x 
N° generalized eigenvalue problem 

(C*HC)V= (C*C)VA, (9) 

where the diagonal of A gives the Kohn-Sham eigenval- 
ues {Ai} 1<i<7 y and the columns of V provide the coef- 
ficients of Kohn-Sham orbitals in terms of EOs. From 
{Ai} 1<i<7 y, one can calculate the chemical potential \i 
and the occupation number {/i}i<^<7v- Finally, by in- 
troducing the Gram matrix 

G = CV • diag(/ i ) • (CV)', 

we can write p(x) as 

(10) 

where k(x) indexes the element that contains x. Notice 
that one only needs the knowledge of the diagonal blocks 
of the Gram matrix G to construct the electron density. 
This allows us to use the recently developed pole expan- 
sion and selected inversion type fast algorithmic^ to 
reduce the asymptotic scaling for solving the general- 
ized eigenvalue problem Q from cubic scaling to at most 
quadratic scaling for 3D bulk systems. 
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IV. PARALLEL IMPLEMENTATION 

Our algorithm is implemented fully in parallel for 
message-passing environment, based on the implemen- 
tation details presented in Ref. [15j Here we summarize 
the key components of the parallel implementation. 

The global domain is discretized with a uniform Carte- 
sian grid with a spacing fine enough to capture the local 
oscillations of the Kohn-Sham orbitals and the electron 
density. Rather than using the dual grid approach with 
one set of grid for representing the Kohn-Sham wavefunc- 
tions, and another set of denser grid for representing the 
electron density, we only use one set of Cartesian grid 
for both the Kohn-Sham wavefunctions and the electron 
density for simplicity of the implementation. The grid 
inside an element Ek is a three-dimensional Cartesian 
Legendre-Gauss-Lobatto (LGL) grid in order to accu- 
rately carry out the operations of the basis functions such 
as numerical integration. The ALBs are first represented 
in a planewave basis set in each extended element (K 
solved by LOBPCG algorithm 33 with a preconditioned 3 -^, 
and are interpolated to each element Ek and orthogonal- 
ized. The eigenvalue problems involved in constructing 
the EOs are performed by LAPACK subroutine dsyevd. 

To simplify the discussion of the parallel implementa- 
tion, we assume that the number of processors is equal 
to the number of elements. It is then convenient to index 
the processors {Pk} with the same index k used for the 
elements. In the more general setting where the num- 
ber of elements is larger than the number of processors, 
each processor takes a couple of elements and the follow- 
ing discussion will apply with only minor modification. 
Each processor P k locally generates and stores the ALBs 
{ukj(x)} for j = 1, 2, . . . , Jk and the coefficients for the 
EOs {C k;ji i} for j = 1,2,..., J k and I = 1, 2, . . . , N°. 
The EOs {(j)k,i(x)} are not explicitly formed in the real 
space. We further partition the non-local pseudopoten- 
tials {bi(x)} by assigning bi(x) to the processor P k if 
and only if the atom associated to be(x) is located in the 
element E k . 

Since the matrices C and H are sparse, the Hamilto- 
nian matrix C £ HC and the mass matrix C^C in (|9| are 
also sparse matrices. However, these matrices are treated 
as dense matrices in our implementation for simplicity. 
The parallel matrix-matrix multiplication for construct- 
ing C £ HC and C £ C are performed using PBLAS subrou- 
tine pdgemm, and the generalized eigenvalue problem ([9| 
is solved by converting it to a standard eigenvalue prob- 
lem using ScaLAPACK 35 subroutine pdpotrf and pdsygst, 
and the standard eigenvalue problem is solved by ScaLA- 
PACK subroutine pdsyevd. 

In our implementation, the matrices H and C are con- 
structed locally according to the element indices. How- 
ever, the ScaLAPACK routines that operate on H and C 
require them to be stored in the two dimensional block 
cyclic pattern. In order to support these two types of 
data storage, we have implemented a rather general com- 
munication framework that only requires the program- 



mer to specify the desired non-local data. This frame- 
work then automatically fetches the data from the pro- 
cessors that store them locally. The actual communica- 
tion is mostly done using asynchronous communication 
routines MPLIsend and MPLIrecv. 



V. NUMERICAL RESULTS 

The new method is implemented with Hartwigsen- 
Goedecker-Hutter (HGH) pseudopotentiaP^, with the lo- 
cal and nonlocal pseudopotential implemented fully in 
the real space 37 . Finite temperature formulation of the 
Kohn-Sham density functional theor}P^ is used, and the 
temperature is set to be 2000K only for the purpose of 
accelerating the convergence of SCF iteration. Since fi- 
nite temperature is used, the accuracy is quantified by 
the error of the total free energ}f^ per atom. HGH pseu- 
dopotential has analytic expression, which allows us to 
minimize the effect of numerical interpolation and per- 
form accurate comparison with existing electronic struc- 
ture code. We compare our result with ABINIT 17 which 
also supports HGH pseudopotential. The ALBs and EOs 
start from random initial guess, and are refined itera- 
tively in the SCF iteration together with the electron 
density. In all the calculations, Anderson mixin^^ with 
Kerker preconditioned^ are used for the SCF iteration. 
Gamma point Brillouin sampling is used for simplicity. 



In Section \U\ and Section III , we count the number of 



basis functions in terms of the number of ALBs per el- 
ement and the number of EOs per element. In this sec- 
tion, we count the number of ALBs and EOs per atom 
instead, in order to be consistent with literature. All 
computational experiments are performed on the Hop- 
per system at the National Energy Research Scientific 
Computing (NERSC) center. Each Hopper node con- 
sists of two twelve-core AMD "MagnyCours" 2.1-GHz 
processors and has 32 gigabytes (GB) DDR3 1333-MHz 
memory. Each core processor has 64 kilobytes (KB) LI 
cache and 512KB L2 cache. It also has access to a 6 
megabytes (MB) of L3 cache shared among 6 cores. 

As mentioned earlier, the ALBs have been shown to 
achieve effective dimension reduction for quasi- ID sys- 
tems, but with deteriorating performance as the dimen- 
sionality of the system increases^. Using Na as example, 
it has been shown that while 4 ALBs per atom is enough 
to reach 1 meV accuracy for quasi- ID systems, 40 ALBs 
per atom is necessary to reach the same accuracy for 3D 
bulk systems. Now using a 3D bulk Na system with 432 
atoms as example, we illustrate that the number of basis 
functions per atom can be effectively reduced using EO. 

The supercell for Na is simple cubic and the length of 
the supercell along each dimension is 45.6 a.u.. A random 
perturbation with standard deviation 0.2 a.u. is applied 
to each atom in the supercell to eliminate the transla- 
tional invariance of the system. The supercell is parti- 
tioned into 6x6x6 elements, with the length of each 
dimension of each element being 7.6 a.u.. The length 
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of each dimension of each extended element is 22.8 a.u. 
which is 3 times larger than that of the element. The 
penalty parameter a in Q is set to be 100. The su- 
percell is discretized with a uniform mesh of dimension 
120 x 120 x 120 in the real space. This mesh is used 
for representing both the electron density and the Kohn- 
Sham orbit als, which corresponds to a planewave cut- 
off of 68 Ry in the Fourier space. ABINIT uses a dual 
grid for representing the Kohn-Sham wavefunctions and 
the electron density. The planewave cutoff for wavefunc- 
tions used in ABINIT is 20 Ry. This corresponds to a 
planewave cutoff for the electron density at 80 Ry, with 
a uniform mesh of dimension 135 x 144 x 144 in the real 
space. The different numbers of grid points along each 
dimension come from the automatic grid adjustment in 
ABINIT. We remark that the grid size is chosen to be 
larger than the typical setup in electronic calculation for 
Na to make sure that the error introduced by the grid 
size is small compared to that introduced by using ALBs 
and EOs. Inside each element a Legendre-Gauss-Lobatto 
(LGL) grid of dimension 30 x 30 x 30 is used for numeri- 
cal integration in the assembly process of the discretized 
Hamiltonian matrix H. The error of the total free en- 
ergy per atom only using ALBs is shown in Fig. [2] (a). 
The error systematically decreases with the increase of 
the number of ALBs. When the number of ALBs ex- 
ceeds 35, the error of the total free energy per atom is 
less than 1 meV. 

Element orbitals (EOs) provide further dimension re- 
duction compared to ALBs. Fig. [2] (b) shows the dif- 
ference of the free energy per atom calculated from EOs 
and that from ABINIT. We construct EOs from as many 
as 42 ALBs per atom, following the criterion ([5| for the 
choice of the candidate functions and using a localization 
radius of 6.0 a.u.. Compared to a converged ALB calcu- 
lation, the error using only 3 EOs per atom is already 
within 5 meV per atom. When 6 EOs are used, the to- 
tal free energy calculated is essentially the same as that 
using 42 ALBs, and the error compared to ABINIT is 
less than 1 meV per atom. Fig. [2] (b) indicates that the 
EOs are indeed effective for reducing the number of basis 
functions per atom for 3D bulk systems. 

Compared to ALB, the EO approach introduces an ad- 
ditional parameter which is the localization radius. Fig. [2] 
(c) shows the error of the total free energy per atom using 
42 ALBs per atom, and 6 EOs per atom but with differ- 
ent localization radius. When the localization radius is 
4.0 a.u. which is 53% the length of an element, the error 
of the total energy per atom is 7 meV. Moderate choice of 
the localization radius of 6.0 a.u. (69% of the length of an 
element) yields accuracy around 1 meV per atom. Fig. [2] 
(c) shows that our method is stable even for a large local- 
ization radius 7.0 a.u. (92% of the length of an element), 
and the error is even smaller and is below 1 meV per 
atom. We also remark that if the localization radius is 
further increased, the EOs are no longer localized around 
the element, but become fully extended in the extended 
element. This can lead to an unstable scheme with large 
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FIG. 2: (color online) (a) Convergence of adaptive local 
basis functions (ALB) for a 3D bulk Na system with 
432 atoms, (b) Convergence of element orbitals (EO) 
for the same Na system with fixed number of ALBs. (c) 
Convergence in terms of the localization radius for the 
same Na system with fixed number of ALBs and fixed 
number of EOs. 



error. Fig. [2] (c) shows that the accuracy of the EO is 
not very sensitive to the choice of localization radius. 

EOs can resemble atomic orbitals but with local mod- 
ifications reflecting the environmental effect, despite the 
fact that they are constructed in the extended elements 
with rectangular domain. Using the same Na system as 
example, we show in Fig. [3] the isosurface of the first 9 
element orbitals (0i to 0g) belonging to the same ex- 
tended element, with the red and blue color indicating 
the positive and negative part of the EOs, respectively. 
27 atoms nearest to these EOs within a sphere of radius 
6.0 a.u. are also plotted in Fig. [3] as gold balls. We see 
that 0i mimics s orbital, 02-04 mimic p-orbitals, and 
05-^9 mimic d-orbitals. Both the general shape and the 
multiplicity of the element orbitals agree well with the 
physical intuition. We also find that hybridization of the 
d orbitals naturally appears in the EOs, reflecting the 
effect of the environment. For example, the isosurface of 
0i exhibits "holes" around atoms. These holes are not 
described in the spherical symmetric s atomic orbital, 
but can only be reflected in orbitals of higher angular 
momentum such as d orbitals. Therefore, EOs are nat- 
ural generalization of atom-centered orbitals, with both 
the atomic and environmental effect taken into account 
simultaneously. 

EOs are localized in the extended elements. Since 
each candidate function is not continuous across the 
boundary of the extended element, EOs are still discon- 
tinuous across the boundary of the extended element. 
Nonetheless, the EOs are "qualitatively continuous" at 
the boundary of the extended elements. Fig. [4] (a) shows 
the behavior of 01,04,07 for the Na system along one 
[100] direction, with the zoom-in near the boundary of 
the extended element shown in Fig. |4](b). EOs are very 
close to a continuous function especially for X and 04 
with lower angular momentum. The value of EOs of 
higher angular momentum such as 07 at the grid point 
closest to the boundary of the extended element is within 
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FIG. 3: The isosurface of the first 9 element orbit als 
belonging to the same extended element, for a 3D bulk 
Na system with 432 atoms. The 27 Na atoms nearest to 

the element orbitals within a sphere of radius 6.0 a.u. 
are plotted as gold balls. The positive and negative part 
of the element orbitals are represented by red and blue 
color, respectively. 



KT 3 . 

EOs can be used for calculating the relative energies 
of different atomic configurations. Fig. [5] (a) shows the 
total free energy per atom for a crystal of Na consisting 
of 6x6x6 = 2 16 unit cells with 432 atoms. Each unit 
cell is body centered cubic with 2 Na atoms. The lattice 
constant ranges from 7.3 a.u. to 7.9 a.u.. The size of each 
element is equal to that of one unit cell. 4 EOs per atom 
are constructed from 42 ALBs per atom and are used for 
calculating the total free energy. The planewave cutoff 
for Kohn-Sham wavefunctions in ABINIT is 20 Ry. The 
difference of the total energy per atom is less than 2 meV 
across all the lattice constants. Similar result can be ob- 
tained for Si. The supercell for Si contains 4 x 4 x 4 = 64 
unit cells with 512 atoms in total. Each unit cell is dia- 
mond cubic with 8 Si atoms. Fig. [5] (b) reports the total 
free energy per atom for lattice constants from 9.9 a.u. to 
10.5 a.u.. Each element only covers | x | x | unit cells. 
We remark that elements occupying a fraction of the unit 
cell are allowed, which is important especially when EOs 
are applied to systems with defects and disorderedness. 
The planewave cutoff for Kohn-Sham wavefunctions in 
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FIG. 4: (color online) (a) The value of the element 
orbitals (blue solid line), ^4 (red dashed line), and <pj 
(black dot dashed line) along one [100] direction of a 3D 
bulk Na system with 432 atoms. The two red circles 
indicate the boundary of the extended element, (b) 
Zoom-in of (a) to the region near the boundary of the 
extended element. The same set of element orbitals (pi 
(blue solid line with circles), ^4 (red dashed line with 
triangles) and (black dot dashed line with diamonds) 
are shown, with the symbols indicating the position of 
the numerical grids. The red circle indicates the 
boundary of the extended element. 

ABINIT is set to be 120 Ry to achieve the high accuracy 
as benchmark solution. The localization radius is also 6.0 
a.u.. Starting from 50 ALBs per atom, 10 EOs per atom 
are computed. The difference of the total free energy per 
atom is less than 1 meV for all lattice constants. 
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FIG. 5: (color online) The total free energy per atom 
for 3D bulk Na system with 432 atoms (a) and 3D bulk 
Si system with 512 atoms (b), with different lattice 
constants calculated from ABINIT and from element 
orbitals. 

EOs are also effective for calculating the total energy 
of systems with defects. For a crystal Na system with 
432 atoms and the length of each dimension of the su- 
percell being 45.6 a.u., the total free energy evaluated 
using ABINIT is -103.27947 a.u.. Using the same setup 
as done in the crystal system with 4 EOs per atom, the 
total free energy evaluated using EO is —103.27588 a.u.. 



The difference is as small as 0.22 meV per atom. Since 
our implementation takes the spin-unpolarized form, we 
consider a system with two vacancies by removing 2 Na 
atoms belonging to one unit cell from the supercell. All 
the parameters are the same as those for the calculation 
of the crystal system. The total free energy evaluated 
using ABINIT is -102.76957 a.u., and the total free en- 
ergy evalauted using 4 EOs per atom is —102.76637 a.u., 
with the difference being 0.20 meV per atom. The error 
for both the crystal and the defect system is less than 1 
meV per atom. We also estimate the formation energy 
of M neutral vacancies by 



AE(M) = E 



N-M 



E% 



N-M 
M ' 



(11) 



with E^ being the free energy for the crystal system with 
N atoms, and Ef^_ M being the free energy for the same 
system but with M atoms removed. Atomic relaxation 
is not taken into account at this stage. Using (11), the 



formation energy calculated from ABINIT is 0.864 eV, 
and that calculated from EO is 0.854 eV. The difference 
of the formation energy is 0.010 eV, and the relative error 
of the formation energy is 1.2%. 

The calculation of the defect formation energy for Si 
is as follows. For a crystal Si system with 512 atoms 
and the length of each dimension of the supercell being 
40.4 a.u., the total free energy evaluated using ABINIT is 
—2030.85824 a.u., and the total free energy evaluated us- 
ing 10 EOs per atom is -2030.85691 a.u.. The difference 
is as small as 0.07 meV per atom. A defect system is con- 
structed by removing one Si atom, and all the parameters 
are the same as those for the crystal calculation. The to- 
tal free energy evaluated using ABINIT is -2026.76478 
a.u., and the total free energy evaluated using 10 EOs 
per atom is 2026.75974 a.u., with the difference being 
0.27 meV per atom. The error for both the crystal sys- 
tem and that for the defect system is less than 1 meV per 
atom. The formation energy calculated from ABINIT is 
3.454 eV, and that calculated from EO is 3.555 eV. The 
difference of the formation energy is 0.101 eV, and the 
relative error of the formation energy is 2.9%. 

Next we study graphene sheet consisting of 32 C atoms 
(cyan balls), with 1 C atom replaced by a Si atom (gold 
ball), as shown in Fig. [6] The length of the supercell 
is 10.000 a.u., 16.108 a.u. and 18.600 a.u. for x,y,z di- 
rections, respectively. The C and Si atoms are in the 
y — z plane. The supercell consists of 4 x 4 elements, 
with each element containing 2 atoms, and represented 
by one black box. The length of each element is therefore 
10.00 a.u., 4.027 a.u. and 4.650 a.u. along x,y,z direc- 
tions, respectively. The shape of the EOs is shown in 
Fig. [6] (a) for the first EOs <j)\ belonging to 2 different 
elements, and (b) for the second EOs <p2 belonging to 
the same 2 elements, respectively. We find that <j)\ in 
the upper element reflects the C-C bond and <pi in the 
lower element reflects the C-Si bond, respectively. Simi- 
larly, <p2 reflects the tt bonds in both the upper and the 
lower elements. The shape of the EOs agree well with the 
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FIG. 6: Graphene sheet consisting of 32 C atoms (cyan 
balls) with 1 C atom substituted by a Si atom (gold 
ball). Each black box represents an element, (a) The 
first element orbital 0i (green) for the upper element 
with 2 C atoms, and the first element orbital (red) 
for the lower element with 1 C atom and 1 Si atom, (b) 
The second element orbital (j)2 (green for the positive 
part and black for the negative part) for the upper 
element with 2 C atoms, and the second element orbital 
<p2 (red for the positive part and blue for the negative 
part) for the lower element with 1 C atom and 1 Si 
atom. 



physical intuition. In particular, the element orbitals are 
not centered around individual atoms but correspond di- 
rectly to chemical bonds, which are of lower energy than 
individual atomic orbitals. Fig. [6] shows that the EOs 
constructed from a complete basis set such as planewaves 
provides a more flexible treatment of chemical environ- 
ment than atom centered orbitals. The total free energy 
calculated using ABINIT with a planewave cutoff at 200 
Ry is —180.56324 a.u.. 12 EOs per atom contracted from 
40 ALBs per atom with localization radius being 3.0 a.u.. 
The total free energy calculated using EO is —180.56279 
a.u.. The difference in the total free energy per atom is 
0.38 meV. 

A more complicated example is a graphene sheet with 
512 C atoms, and with 128 of the C atoms randomly se- 
lected and replaced by Si atoms. The atomic configura- 
tion is shown in Fig. [7] (a), with the C atoms represented 
by cyan balls and Si atoms represented by gold balls, re- 
spectively. The atoms are all in the y — z plane, and the 
dimension of the supercell is 10.000 a.u., 64.432 a.u. and 
74.400 a.u. along x, y, z directions, respectively. The elec- 
tron density in the y — z plane is shown in Fig.[7|(b). The 
total free energy calculated from ABINIT is -2639.02487 
a.u., and the total free energy calculated from EO with 
12 EOs per atom for all elements is —2639.11504 a.u.. 
The error of the total free energy per atom is 4.79 meV 
per atom. 

The fact that a small number of EOs per atom already 
achieve high accuracy allows us to perform calculations 
for systems of large size. Here we study 3D bulk Na sys- 
tems of various sizes, ranging from 128 atoms to 4394 
atoms. The length of the supercell along each dimension 
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the rest of the computational time (classified as "other 
time" in Fig. [8| mainly includes constructing the elec- 
tron density using (10) in the global domain, solving the 



FIG. 7: (a) The atomic configuration of a graphene 
sheet consisting of 512 C atoms (cyan balls), with 128 C 

atoms randomly selected and substituted by Si atoms 
(gold balls), (b) The electron density across y — z plane. 



is also proportional to the system size, from 30.4 a.u. 
for 128 atoms to 98.8 a.u. for 4394 atoms. The number 
of processors (computational cores) used is chosen to be 
proportional to the number of atoms, with 64 processors 
used for 128 atoms, and 2196 processors used for 4392 
atoms. 4 EOs per atom are constructed from 42 ALBs 
per atom for all calculations. The total time per SCF it- 
eration is shown in Fig. [8] We find that even though the 
number of atoms increase by a factor of 34, the wall clock 
time only increases by less than 4 times from 114 sec for 
128 atoms to 413 sec for 4394 atoms. The small increase 
of the total wall clock time is because the time for solving 
the generalized eigenvalue problem (|9|, which is asymp- 
totically the computationally dominating part, only takes 
less than 100 sec even for system as large as 4392 atoms, 
thanks to the small number of basis functions per atom 
allowed to be used in the calculation. The time for gen- 
erating the ALBs using LOBPCG and the time for con- 
structing the EOs from the ALBs are flat for all systems, 
since these steps are localized in each extended element 
and the computational cost is independent of the global 
system size. The overall time for solving the generalized 
eigenvalue problem (|9| has not dominated the computa- 
tional time for 4392 atoms with a Hamiltonian matrix of 
size 17568. However, the wall clock time for this part al- 
ready scales quadratically with respect to the number of 
atoms. Since the number of processors scale linearly with 
respect to the system size, the overall time for solving the 
generalized eigenvalue problem scales cubically with re- 
spect to the system size, and will eventually dominate 
the overall running time for systems of larger size. The 
overhead of the DG calculation involves the assembly of 
the DG matrix H, the construction of the Hamiltonian 
matrix C^HC and the mass matrix C^C using parallel 
matrix-matrix multiplication, as well as the communi- 
cation time. As alluded to earlier, the parallel matrix- 
matrix multiplication treats C and H as dense matrices 
in the current implementation. Therefore the asymptotic 
scaling of this part has the same asymptotic cubic scal- 
ing as solving the generalized eigenvalue problem. All 



Kohn-Sham potential from the electron density, charge 
mixing as well as the extra data communication. 
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FIG. 8: (color online) The total computational time per 
SCF iteration (red solid line with upward-pointing 
triangles) for 3D bulk Na systems ranging from 128 
atoms to 4394 atoms. The breakdown of the total 

computational time includes the time for using 
LOBPCG to generate adaptive local basis functions 

(blue dashed line with diamonds), the time for 
constructing the element orbitals from adaptive local 
basis functions (black dot dashed line with circles), the 

time for solving the generalized eigenvalue problem 
using dense ScaLAPACK solver (green solid line with 
left-pointing triangles), the overhead time for solving 

the DG problem (magenta dashed line with 
right-pointing triangles), and the rest of the time in a 
SCF iteration (cyan dot dashed line with stars). 



We also remark that treating the Hamiltonian matrix 
as dense matrices greatly increases the memory cost and 
the communication volume. Fig. [9] (a) shows the amount 
of memory used per processor. When the number of 
atoms is 4394, the memory used per processor is 5.5 
GB, which becomes the bottleneck for further increasing 
the system size, despite that the computational time per 
SCF is still within affordable range. The communication 
volume, indicated by the percentage of the communica- 
tion time within the total computational time is shown in 
Fig. [9] (b) . The communication time occupies more than 
40% of the total time for systems with 4394 atoms. Both 
the large memory cost and the large communication vol- 
ume is largely due to the treatment of C and H as dense 
matrices, and shall be improved in the future work. 
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FIG. 9: (color online) The memory cost per processor 
(a) and the communication percentage (b) for 3D bulk 
Na systems ranging from 128 atoms to 4394 atoms. 



VI. CONCLUSION 

In conclusion, we have introduced the element orbitals 
for discretizing the Kohn-Sham Hamiltonian in the pseu- 
dopotential framework, which are contracted automati- 
cally from a uniform basis set. Comparing with the ex- 
isting contracted basis sets, element orbitals incorporate 
environment information by including directly all atoms 
in the neighboring elements on the fly. The implementa- 
tion of element orbitals is straightforward thanks to the 
rectangular partitioning of the domain. The accuracy of 
element orbitals are systematically improvable and the 
same procedure can be applied to systems under vari- 
ous conditions. The element orbitals are constructed by 
solving KSDFT locally in the real space, and localized 
on each element via a localization procedure. We remark 
that the localization procedure used for constructing the 
element orbitals is not grounded on the near-sightedness 
property as in the linear scaling methods for insulating 



system^^J. Instead of finding the compact representa- 
tions for the Kohn-Sham invariant subspaces 43 , the cur- 
rent work seeks for a set of compact basis functions in 
the real space, while the coefficients of the basis set for 
representing the Kohn-Sham orbitals can still be delocal- 
ized. As is shown by the numerical examples, the current 
procedure is applicable to both insulating and metallic 
systems. 

Our numerical examples also indicate that treating C 
and H as dense matrices can greatly increase the memory 
cost, the communication volume and the computational 
time especially for systems of large size. The future im- 
provement includes treating C and H as sparse matrices 
so that the construction of the Hamiltonian matrix C £ HC 
and the mass matrix C £ C is of linear scaling. By treat- 
ing C and H as sparse matrices, we can also incorporate 
the recently developed pole expansion and selected in- 
version type fast algorithms'^ to reduce the asymp- 
totic scaling for solving the generalized eigenvalue prob- 
lem ([9| from cubic scaling to at most quadratic scaling 
for 3D bulk systems. We also remark that the current 
procedure for constructing the orbitals from adaptive lo- 
cal basis functions is still a costly procedure inside each 
element. Method for generating element orbitals directly 
inside the extended element is also under our exploration. 
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